ABSTRACT

Linear prediction is presented as a spectral modeling technique
in which the signal spectrum is modeled by an all-pole spectrum.
The method allows for arbitrary spectral shaping in the frequency
domain, and for modeling of continuous as well as discrete spectra
(such as filter bank spectra). In addition, using the method of
selective linear prediction, all-pole modeling is applied to
selected portions of the spectrum, with applications to speech
recognition and speech compression. Linear prediction is compared
with traditional analysis-by-synthesis techniques for spectral
modeling. It is found that linear prediction offers computational
advantages over analysis-by-synthesis, as well as better modeling
properties if the variations of the signal spectrum from the
desired spectral model are large. For relatively smooth spectra
and for filter bank spectra, analysis-by-synthesis is judged to
give better results. Finally, a suboptimal solution to the problem
of all-zero modeling using linear prediction is given.

I. INTRODUCTION

The short-time spectrum has been perhaps the single most
important method of analysis for the study of speech. Its
applications in speech synthesis, speech recognition and speaker
identification are pervasive and well known. The extensive use of
the short-time spectrum as an analysis tool began with the
development of the sound spectrograph [1]. Even today, this
three-dimensional time-frequency-intensity spectral representation
is of great utility. However, there are obvious limitations on the
range and flexibility of the types of analysis that can be
performed, as well as limitations in the resolution and dynamic
range of the output spectrogram display.

Many of the limitations of the spectrograph were overcome upon
the introduction, in the 1950's, of high-speed digital computers
in spectral analysis. Simultaneous with the advance in computation
there were significant advances in understanding the acoustics of
speech production. This was highlighted in 1960 by the publication
of Fant's Acoustic Theory of Speech Production [2]. As a result of
the two types of advances mentioned above, the method of spectral
analysis-by-synthesis (AbS) for the reduction of speech spectra was
introduced at M.I.T. and Bell Laboratories in 1961. At M.I.T. the
method was used on filter-bank derived spectra to extract the pole
pattern of vowels [3,4] and pole-zero patterns of nasals [5]. At
Bell Laboratories, analysis-by-synthesis was used on the computed
spectrum of a single pitch period to extract the formants
(resonances, poles) of the vocal tract as well as the zeros of the
glottal spectrum [6].

In  spectral  AbS,  a  speech  spectrum  is  fitted  by  another 
spectrum  that  is  represented  in  terms  of  poles  and  zeros.  The 
fit  is  optimized  through  the  minimization  of  some  error  criterion. 
The  error  between  the  two  log  spectra  is  minimized  in  an 
iterative manner. The early attempts minimized the error by
recursively varying only one pole or zero at a time. These methods
were error prone and were not easily adaptable to an automatic
algorithm. More recently, Olive [7] developed a Newton-Raphson
technique that performs the iterative computation on all
poles simultaneously and in a straightforward automatic manner.

In  this  paper  we  present  another  method  of  spectral  modeling 
which  makes  use  of  recent  advances  in  the  field  of  digital  signal 
processing,  in  particular  the  introduction  of  linear  prediction 
(LP)  to  speech  analysis.  The  major  difference  between  AbS  and 
LP  analysis  is  the  error  criterion  used  in  the  matching  process, 
which  in  the  latter  is  the  integrated  ratio  of  the  two  spectra. 

In general, this error criterion leads to a better spectral
envelope fit. In addition, for the special (but important) case
of an all-pole model spectrum, LP analysis offers two important
advantages: (a) the computations for the spectral parameters
are straightforward and noniterative, and (b) if the time signal
is available there is no need to compute the spectrum first.

The two methods have two properties in common: (a) the spectral
matching can be done selectively to any portion of the spectrum,
and (b) both error criteria are functions of the ratio of the
original and model spectra, thereby resulting in a matching
process that is uniform over the frequency range of interest.

In  Section  II  we  apply  LP  analysis  to  spectral  matching 
by  all-pole  model  spectra.  Section  III  describes  the  properties 
of  the  optimum  model  spectrum.  In  Section  IV  we  introduce  the 
method  of  selective  linear  prediction,  where  LP  analysis  is 
applied  to  a  selected  portion  of  the  spectrum,  and  we  describe 
its  applications  to  speech  recognition  and  speech  compression. 
Section  V  describes  the  application  of  LP  analysis  to  the  modeling 
of  discrete  spectra  (such  as  harmonic  spectra  and  those  obtained 
from a bank of filters). Section VI examines the properties of
the error measure used in LP analysis and gives a critical
comparison between LP analysis and analysis-by-synthesis. Section VII
gives a suboptimal solution to the problem of all-zero modeling
using LP analysis.

II. LINEAR PREDICTIVE SPECTRAL MODELING

Let us assume that we are given a power spectrum $P(\omega)$ with
bandwidth $B$, i.e. $P(\omega)$ is known for
$0 \le \omega \le \omega_b = 2\pi B$. (The more general case where
the frequency range covers only a portion of a spectrum is treated
later.) In this method, we shall view $P(\omega)$ as the spectrum of
some signal $s(nT)$ that was sharply low-pass filtered at $B$ Hz and
sampled at a frequency $f_s = 2B = 1/T$, where $T$ is the sampling
period. We shall view $P(\omega)$ as such irrespective of how it was
actually generated. This now allows us to deal with $P(\omega)$ as
the spectrum of a sampled signal and, hence, we can make use of
digital signal processing techniques. In particular, instead of
using the complex s plane we now use the complex z plane. In
essence, we map $P(\omega)$ onto the upper half of the unit circle in
the z plane such that the angular distance $\theta = \omega T$. The
mapping is such that $\omega = 0$ corresponds to $\theta = 0$ and
$\omega = \omega_b$ corresponds to $\theta = \pi$. In addition,
$P(-\omega) = P(\omega)$ defines the spectrum over the bottom half
of the unit circle, i.e. the spectrum is even and real. (For
convenience, we shall set the sampling interval $T = 1$. For other
values of $T$ simply replace $\omega$ by $\omega T$ in the
appropriate equations.) Thus, we shall assume that

$$P(\omega) = |S(e^{j\omega})|^2 , \qquad (1)$$

where $S(z)$ is the z transform of the hypothetical signal $s_n$.


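We wish to fit $P(\omega)$ in some optimal manner by an all-pole
spectrum $\hat{P}(\omega)$. Let us assume that the model spectrum
corresponds to a transfer function $\hat{S}(z)$ given by

$$\hat{S}(z) = \frac{G}{A(z)} = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}} , \qquad (2)$$

where

$$A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k} \qquad (3)$$

will be called the inverse filter, $p$ is the number of poles
in the model spectrum, and $G$ is a constant gain factor. The model
spectrum $\hat{P}(\omega)$ is then given by

$$\hat{P}(\omega) = |\hat{S}(e^{j\omega})|^2 = \frac{G^2}{|A(e^{j\omega})|^2} = \frac{G^2}{\left| 1 + \sum_{k=1}^{p} a_k e^{-jk\omega} \right|^2} . \qquad (4)$$

Given a spectrum $P(\omega)$ and a number of poles $p$, we must
determine the parameters $\{a_k, 1 \le k \le p\}$ and $G$.

We define an error measure $E$ between $P(\omega)$ and
$\hat{P}(\omega)$:

$$E = \frac{G^2}{2\pi} \int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)} \, d\omega \qquad (5)$$

$$\;\; = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) \, |A(e^{j\omega})|^2 \, d\omega . \qquad (6)$$
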
$E$ can be interpreted as the total energy of the "error signal"
obtained by passing the hypothetical signal $s_n$ through the
inverse filter $A(z)$. (This is clear by using Parseval's theorem.)
Note from (6) that $E$ is defined to be independent of $G$. The
gain factor is determined from energy considerations.

The parameters $\{a_k\}$ are determined by minimizing $E$ in (6)
with respect to each of the parameters. This is accomplished by
setting

$$\frac{\partial E}{\partial a_i} = 0 , \quad 1 \le i \le p . \qquad (7)$$

From (4-6) it can be shown that [8]

$$\frac{\partial E}{\partial a_i} = 2 \left[ R_i + \sum_{k=1}^{p} a_k R_{|i-k|} \right] , \qquad (8)$$

where

$$R_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) \cos(k\omega) \, d\omega \qquad (9)$$

is the autocorrelation function corresponding to the signal
spectrum $P(\omega)$. From (7) and (8) we must have

$$\sum_{k=1}^{p} a_k R_{|i-k|} = -R_i , \quad 1 \le i \le p . \qquad (10)$$

This is a set of $p$ linear equations in $p$ unknowns which can be
solved for the parameters $\{a_k\}$ of the all-pole model spectrum.
A recursive solution is given elsewhere [8,15,16].
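
One standard recursive solution of (10) is the Levinson-Durbin
recursion, which exploits the Toeplitz structure of the coefficient
matrix. The sketch below is a minimal illustration and not the
authors' program; the function name `levinson_durbin` and the test
values are assumptions made for this example. It returns the
coefficients with $a_0 = 1$ together with the residual energy
$R_0 + \sum_k a_k R_k$.

```python
def levinson_durbin(R, p):
    """Solve sum_{k=1..p} a_k R[|i-k|] = -R[i], 1 <= i <= p (eq. 10).

    R holds R_0..R_p. Returns (a, E) where a[0] = 1 and
    E = R_0 + sum_k a_k R_k is the residual energy.
    """
    a = [1.0] + [0.0] * p
    E = R[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i
        acc = R[i] + sum(a[k] * R[i - k] for k in range(1, i))
        k_i = -acc / E
        # Order update: a_k <- a_k + k_i * a_{i-k}
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + k_i * prev[i - k]
        a[i] = k_i
        E *= 1.0 - k_i * k_i
    return a, E


# Hypothetical check: R_k = 0.5**k is (up to scale) the autocorrelation
# of a one-pole spectrum, so a 2-pole fit should give a_1 = -0.5, a_2 = 0.
a, E = levinson_durbin([1.0, 0.5, 0.25], p=2)
```

Each order of the recursion costs a number of operations
proportional to the order, so the whole solution takes on the order
of $p^2$ operations.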

The minimum error is obtained by substituting (9) and (10)
in (6). The result can be shown to be

$$E_p = R_0 + \sum_{k=1}^{p} a_k R_k , \qquad (11)$$

where the dependence of the minimum error on $p$ is shown explicitly.

The gain factor $G^2$ in (4) is obtained by conserving energy
between the original and model spectra, i.e.

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) \, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{P}(\omega) \, d\omega , \qquad (12)$$

or $R_0 = \hat{R}_0$, where

$$\hat{R}_i = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{P}(\omega) \cos(i\omega) \, d\omega \qquad (13)$$

is the autocorrelation function corresponding to the model
spectrum. An analytic expression for $\hat{R}_i$ is more easily
obtained from the unit sample response $\hat{s}_n$ of $\hat{S}(z)$
in (2). Taking the inverse z transform of (2) we have

$$\hat{s}_n = \begin{cases} 0 , & n < 0 , \\ G , & n = 0 , \\ -\sum_{k=1}^{p} a_k \hat{s}_{n-k} , & n > 0 . \end{cases} \qquad (14)$$
By definition, the autocorrelation function $\hat{R}_i$ is given by

$$\hat{R}_i = \sum_{n=-\infty}^{\infty} \hat{s}_n \hat{s}_{n+|i|} . \qquad (15)$$

From (14) and (15) it can be shown that

$$\hat{R}_0 = G^2 - \sum_{k=1}^{p} a_k \hat{R}_k \qquad (16)$$

and

$$\hat{R}_i = -\sum_{k=1}^{p} a_k \hat{R}_{|i-k|} , \quad 1 \le |i| \le \infty . \qquad (17)$$

From (10), (12), (16) and (17), we conclude that

$$\hat{R}_i = R_i , \quad 0 \le i \le p , \qquad (18)$$

and

$$G^2 = R_0 + \sum_{k=1}^{p} a_k R_k . \qquad (19)$$

Therefore, from (11) and (19), $G^2$ is equal to the minimum error
$E_p$.

Equations (10) and (19) completely specify the model spectrum
$\hat{P}(\omega)$. Given a spectrum $P(\omega)$ and a desired number
of poles $p$, the parameters of $\hat{P}(\omega)$ are obtained by
computing the autocorrelation coefficients $R_i$, $0 \le i \le p$,
using (9). The coefficients $\{a_k\}$ are then computed from (10)
and the gain $G$ from (19).

Equivalently, if the speech signal is given, it is not necessary
to compute $P(\omega)$ first. Instead, the autocorrelation
coefficients can be computed from the signal directly:

$$R_i = \sum_{n=-\infty}^{\infty} s_n s_{n+|i|} , \quad 0 \le i \le p . \qquad (20)$$

It is clear that (20) can be evaluated only if the signal is of
finite duration. This usually brings up the issue of windowing.
(See Makhoul and Wolf [8] for a discussion of windowing of speech
signals.)
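
As an illustration of (20) for a finite-duration signal, the
following sketch windows one frame and sums the lagged products
directly. The helper names, the Hamming window formula, and the
500 Hz test frame are assumptions made for this example; they are
not taken from the text.

```python
import math


def hamming(n_samples):
    """Hamming window of the given length (a common windowing choice)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def autocorrelation(signal, p):
    """R_i = sum_n s_n s_{n+i}, 0 <= i <= p, for a finite signal (eq. 20).

    Samples outside the frame are taken to be zero, so the sum over n
    runs only while n + i stays inside the frame.
    """
    N = len(signal)
    return [sum(signal[n] * signal[n + i] for n in range(N - i))
            for i in range(p + 1)]


# Hypothetical test frame: 20 ms of a 500 Hz cosine sampled at 20 kHz
fs = 20000
frame = [math.cos(2 * math.pi * 500 * n / fs) for n in range(400)]
windowed = [w * s for w, s in zip(hamming(len(frame)), frame)]
R = autocorrelation(windowed, p=4)
```

The resulting $R_0, \ldots, R_p$ can be used in (10) in place of the
coefficients obtained from (9).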

The spectral fitting method described in this section can
be shown to be equivalent to the autocorrelation method of
linear prediction [8,9], where the coefficients $a_k$ are the
predictor coefficients. That is why we have chosen to call this
method the linear predictive (LP) spectral modeling method. The
model spectrum $\hat{P}(\omega)$ is also known as the LP spectrum.

Figure 1 shows an example of LP spectral matching for a
spectrum over 0-10 kHz with the number of poles p=28. In this
case the original spectrum $P(\omega)$ was obtained by computing the
fast Fourier transform (FFT) of a 20 ms, Hamming windowed, 20
kHz sampled speech signal. The spectrum $\hat{P}(\omega)$ was
computed from (4) by dividing $G^2$ by the magnitude squared of the
FFT of the sequence: $1, a_1, a_2, \ldots, a_p$. Arbitrary frequency
resolution can be obtained by simply appending an appropriate
number of zeros to this sequence before taking the FFT.
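
The computation just described can be sketched as follows. To keep
the example self-contained, a direct DFT sum stands in for the FFT;
evaluating the sum at the frequencies $2\pi m/M$ gives the same
values as an FFT of the sequence zero-padded to length $M$. The
function name and the one-pole test values are assumptions made for
illustration.

```python
import cmath
import math


def lp_spectrum(a, G2, M):
    """Evaluate P_hat(w) = G^2 / |A(e^{jw})|^2 (eq. 4) on a grid.

    a  : [1, a_1, ..., a_p]
    G2 : squared gain G^2
    M  : length the sequence would be zero-padded to; the spectrum is
         returned at w_m = 2*pi*m/M for m = 0..M//2 (0 to pi).
    """
    spectrum = []
    for m in range(M // 2 + 1):
        w = 2 * math.pi * m / M
        A = sum(ak * cmath.exp(-1j * k * w) for k, ak in enumerate(a))
        spectrum.append(G2 / abs(A) ** 2)
    return spectrum


# Hypothetical one-pole model: A(z) = 1 - 0.5 z^-1, G^2 = 0.75
P_hat = lp_spectrum([1.0, -0.5], 0.75, M=8)
```

Increasing `M` only refines the frequency grid; the underlying
model spectrum is unchanged.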


Before we turn to the general case of selective spectral
matching, we shall examine the properties of the model spectrum
$\hat{P}(\omega)$.

III. PROPERTIES OF THE MODEL SPECTRUM

The poles of the model spectrum can be found by computing
the roots of the polynomial $A(z)$ in (3). Since the coefficients
$a_k$ are real, some or none of the roots are real and the rest are
complex conjugate pairs. Conversion of the poles to the s plane
can be achieved by setting each root $z_k = e^{s_k T}$, where
$s_k = \sigma_k + j\omega_k$ is the corresponding pole in the
s plane. If the root $z_k = z_{kr} + j z_{ki}$, then:

$$\omega_k = \frac{1}{T} \arctan \frac{z_{ki}}{z_{kr}} , \qquad (21a)$$

$$\sigma_k = \frac{1}{2T} \log \left( z_{kr}^2 + z_{ki}^2 \right) , \qquad (21b)$$

where $z_{kr}$ and $z_{ki}$ are the real and imaginary parts of
$z_k$, respectively, and $T$ is the sampling period.
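
A minimal sketch of (21a) and (21b) for a single root is given
below. The function name is hypothetical, and the conversion of
$\sigma_k$ to a bandwidth in Hz as $-\sigma_k/\pi$ is a common
convention assumed here rather than something stated in the text.
A four-quadrant arctangent is used so that roots with a negative
real part map to the correct frequency.

```python
import cmath
import math


def pole_to_s_plane(z_k, T):
    """Map a z-plane root z_k = exp(s_k T) to (frequency, bandwidth) in Hz."""
    # Eq. (21a): angle of the root divided by T, in rad/s
    omega_k = math.atan2(z_k.imag, z_k.real) / T
    # Eq. (21b): log of the squared radius divided by 2T, in nepers/s
    sigma_k = math.log(z_k.real ** 2 + z_k.imag ** 2) / (2 * T)
    freq_hz = omega_k / (2 * math.pi)
    bandwidth_hz = -sigma_k / math.pi   # assumed convention, see lead-in
    return freq_hz, bandwidth_hz


# Hypothetical root: radius 0.95 at an angle of 0.1*pi, 10 kHz sampling
T = 1.0 / 10000
freq, bw = pole_to_s_plane(0.95 * cmath.exp(1j * 0.1 * math.pi), T)
```

For this root the frequency comes out at 500 Hz, since an angle of
$0.1\pi$ is one twentieth of the 10 kHz sampling frequency.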

One important property of the poles of $\hat{S}(z)$ is that they
are guaranteed to be inside the unit circle, provided $P(\omega)$ is
a positive definite spectrum [10].

For a well chosen number of poles $p$, some of the poles of
$\hat{S}(z)$ can be related to vocal tract resonances. The extent to
which the formant values thus obtained reflect the actual
resonances of the vocal tract depends on several factors, including
the adequacy of the all-pole model for each spectrum considered,
and the number of poles in the model. These issues are
discussed in more detail elsewhere [8].


The manner in which the model spectrum $\hat{P}(\omega)$
approximates $P(\omega)$ is reflected largely in (18), which relates
the autocorrelation coefficients $\hat{R}_i$ and $R_i$. Since
$\hat{P}(\omega)$ and $P(\omega)$ are the Fourier transforms of
$\hat{R}_i$ and $R_i$ respectively, it follows that increasing
the value of $p$ increases the range over which $\hat{R}_i$ and
$R_i$ are equal, resulting in a better fit of $\hat{P}(\omega)$ to
$P(\omega)$. In the limit, as $p \to \infty$, $\hat{R}_i$ becomes
identical to $R_i$ for all $i$, and hence the two power spectra
become identical:

$$\hat{P}(\omega) = P(\omega) , \quad \text{as } p \to \infty . \qquad (22)$$

Since the minimum error $E_p = G^2$, we have from (5):

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{P(\omega)}{\hat{P}(\omega)} \, d\omega = 1 . \qquad (23)$$

Equation (23) is true for all values of $p$. In particular, it is
true as $p \to \infty$, in which case from (22) we see that (23)
becomes an identity. Another important case where (23) becomes an
identity is when $P(\omega)$ is an all-pole spectrum with $p_0$
poles; then $\hat{P}(\omega)$ will be identical to $P(\omega)$ for
all $p \ge p_0$. Relation (23) will be useful in discussing the
properties of the error measure in Section VI.
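
Relation (23) can be checked numerically. In the sketch below the
integral is replaced by an average over $N$ equally spaced points
on the unit circle, with the autocorrelation coefficients computed
from the same points (the discrete form treated in Section V). The
test spectrum, the order, and the helper names are assumptions
chosen for illustration, and the Levinson-Durbin recursion is one
standard way of solving (10).

```python
import cmath
import math


def levinson_durbin(R, p):
    """Solve (10) for a_1..a_p; returns ([1, a_1..a_p], minimum error)."""
    a, E = [1.0] + [0.0] * p, R[0]
    for i in range(1, p + 1):
        k_i = -(R[i] + sum(a[k] * R[i - k] for k in range(1, i))) / E
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + k_i * prev[i - k]
        a[i] = k_i
        E *= 1.0 - k_i * k_i
    return a, E


N, p = 64, 4
w = [2 * math.pi * n / N for n in range(N)]
# Hypothetical even, strictly positive test spectrum
P = [2.0 + math.cos(wn) + 0.5 * math.cos(3 * wn) for wn in w]

# Autocorrelation coefficients from the sampled spectrum
R = [sum(Pn * math.cos(k * wn) for Pn, wn in zip(P, w)) / N
     for k in range(p + 1)]
a, G2 = levinson_durbin(R, p)        # G^2 equals the minimum error (19)


def model_spectrum(wn):
    """P_hat(w) = G^2 / |A(e^{jw})|^2, eq. (4)."""
    A = sum(ak * cmath.exp(-1j * k * wn) for k, ak in enumerate(a))
    return G2 / abs(A) ** 2


# Average of P / P_hat over the unit circle; (23) says this equals 1
mean_ratio = sum(Pn / model_spectrum(wn) for Pn, wn in zip(P, w)) / N
```

The average equals one to within rounding error for any order $p$,
even though the 4-pole model is not an exact fit to this spectrum.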


Another property of $\hat{P}(\omega)$ that will be discussed later
on is that the slope of $\hat{P}(\omega)$ is zero at $\omega = 0$ and
$\omega = \pi$:

$$\frac{\partial \hat{P}(\omega)}{\partial \omega} = 0 , \quad \omega = 0, \pi . \qquad (24)$$

This can be easily seen by rewriting (4) as

$$\hat{P}(\omega) = \frac{G^2}{b_0 + 2 \sum_{k=1}^{p} b_k \cos(k\omega)} , \qquad (25)$$

where

$$b_k = \sum_{n=0}^{p-|k|} a_n a_{n+|k|} , \quad a_0 = 1 , \quad 0 \le k \le p , \qquad (26)$$

are the autocorrelation coefficients of the impulse response of
the inverse filter $A(z)$. By taking $\partial/\partial\omega$ in
(25), it is clear that it is equal to zero at 0 and $\pi$.

Equation (25) gives another method for computing $\hat{P}(\omega)$,
and that is by dividing $G^2$ by the real part of the FFT of the
sequence $b_0, 2b_1, 2b_2, \ldots, 2b_p$.
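
The cosine form (25) and the coefficients (26) can be sketched
directly, as below. The function names and the one-pole test values
are assumptions made for this example; the cosine sum is evaluated
at a single frequency rather than through an FFT.

```python
import math


def inverse_filter_autocorr(a):
    """b_k = sum_{n=0}^{p-k} a_n a_{n+k}, with a[0] = 1 (eq. 26)."""
    p = len(a) - 1
    return [sum(a[n] * a[n + k] for n in range(p - k + 1))
            for k in range(p + 1)]


def lp_spectrum_cosine(a, G2, w):
    """P_hat(w) = G^2 / (b_0 + 2 sum_k b_k cos(k w)), eq. (25)."""
    b = inverse_filter_autocorr(a)
    denom = b[0] + 2 * sum(b[k] * math.cos(k * w) for k in range(1, len(b)))
    return G2 / denom


# Hypothetical one-pole model: A(z) = 1 - 0.5 z^-1, G^2 = 0.75
b = inverse_filter_autocorr([1.0, -0.5])
P_at_0 = lp_spectrum_cosine([1.0, -0.5], 0.75, 0.0)
P_at_pi = lp_spectrum_cosine([1.0, -0.5], 0.75, math.pi)
```

Because the denominator is a sum of cosines, its derivative is a sum
of sines, which vanishes at $0$ and $\pi$, in agreement with (24).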
IV. SELECTIVE LINEAR PREDICTION

We now generalize the LP spectral modeling method to the
case where we wish to fit a selected portion of a given spectrum.

In general, we have a spectrum $P(\omega)$,
$0 \le \omega \le \omega_b$, and we wish to match the spectrum in a
region $\omega_\alpha \le \omega \le \omega_\beta$ by an all-pole
spectrum $\hat{P}(\omega)$ as given by (4). Call the spectrum in
that region $P'(\omega)$. In order to compute the parameters of
$\hat{P}(\omega)$ we simply map the region onto the unit circle such
that $\omega_\alpha \to 0$ and $\omega_\beta \to \pi$ (the arrow is
read "mapped into"), and then follow the same procedure outlined in
Section II. The mapping is done as follows. Define

$$\omega' = \omega - \omega_\alpha$$

and

$$\omega'_\beta = \omega_\beta - \omega_\alpha .$$

Then

$$0 \le \omega' \le \omega'_\beta$$

and

$$T' = \frac{\pi}{\omega'_\beta} , \qquad (27)$$

where $T'$ is a new hypothetical sampling interval. The problem
now reduces to the original one. The autocorrelation coefficients
$R_k$, $0 \le k \le p$, are computed from (9) with $P'(\omega')$
replacing $P(\omega)$. Then (10) and (19) are used to solve for the
parameters of the model spectrum $\hat{P}(\omega')$.
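
A minimal sketch of the band mapping for a sampled spectrum is
given below. It assumes, for illustration only, that the spectrum is
available at discrete frequencies in Hz and that both band edges
coincide with spectral samples; the function name and the flat test
spectrum are hypothetical. The overall scale of the coefficients is
left unnormalized, which does not change the solution of (10).

```python
import math


def selective_autocorr(P, freqs, f_a, f_b, p):
    """Autocorrelation R_0..R_p of the band f_a <= f <= f_b.

    The band is mapped onto the upper half of the unit circle
    (f_a -> 0, f_b -> pi) and mirrored onto the lower half by evenness.
    """
    band = [(f, Pf) for f, Pf in zip(freqs, P) if f_a <= f <= f_b]
    R = []
    for k in range(p + 1):
        total = 0.0
        for f, Pf in band:
            theta = math.pi * (f - f_a) / (f_b - f_a)
            # Interior points occur at +theta and -theta on the circle;
            # the band edges (0 and pi) occur only once.
            weight = 1.0 if f == f_a or f == f_b else 2.0
            total += weight * Pf * math.cos(k * theta)
        R.append(total)
    return R


# Hypothetical flat spectrum sampled every 1 kHz from 0 to 10 kHz;
# model only the 0-5 kHz band.
freqs = [1000.0 * m for m in range(11)]
P = [1.0] * 11
R = selective_autocorr(P, freqs, 0.0, 5000.0, p=2)
```

For a flat band the coefficients beyond $R_0$ vanish, as expected
for a spectrum that needs no poles. The returned values can be
passed to any solver of (10).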
Figure 2 shows an example of selective linear prediction.
The signal spectrum is identical to that shown in Figure 1. In
Figure 2 the two halves of the spectrum were matched separately
by a 14-pole model spectrum in each half. In the left half
$\omega_\alpha = 0$ and $\omega_\beta = 5$ kHz, and in the right half
$\omega_\alpha = 5$ kHz and $\omega_\beta = 10$ kHz.
Since the matching for each half was done independently, there
is no guarantee that the two model spectra will join smoothly
at 5 kHz. In fact, in general, a discontinuity such as the one
in Figure 2 is expected. Recall that the model spectrum has
zero slope at 0 and $\pi$. This is evident in Figure 2 at 5 kHz.
The reader will also note other differences between Figs. 1 and 2
in the manner in which the original and model spectra match.

Figure  3  shows  the  same  signal  spectrum  as  in  Figure  2, 
but  with  the  right  half  of  the  spectrum  being  fitted  by  only  a 
5-pole  spectrum.  This  demonstrates  the  flexibility  of  selective 
linear  prediction  in  that  different  portions  of  a  spectrum  can  be 
matched  using  different  numbers  of  poles. 

Applications  to  Speech  Recognition  and  Compression 

Here we shall demonstrate the idea of selective linear
prediction as applied to speech recognition and speech compression.
It is important to note that, since we assume the availability of
the signal spectrum $P(\omega)$, any desired frequency shaping or
filtering can be done directly to the signal spectrum before
LP analysis is performed. This will be clear below.

Figures 1-3 show the same signal spectrum computed from a
20 kHz sampled signal. Now, in order to model the spectral
envelope for the whole frequency range from 0-10 kHz, one would
probably use anywhere between 24 and 28 poles for the all-pole model
spectrum. A 28-pole fit is shown in Fig. 1. For speech recognition
applications, however, the main region of interest is the
0-5 kHz region. The spectrum in the 5-10 kHz region is of interest
mainly for the recognition of fricatives, in which case the
total energy in that region might be sufficient. We also know
that in LP analysis the spectral matching process performs
uniformly over the whole frequency range, which might not be
desirable in this case because the all-pole assumption for many
speech sounds is less applicable for frequencies greater than
5 kHz. Therefore, instead of modeling the whole spectrum, we
use selective LP to model the lower 5 kHz by a lower order
all-pole spectrum. A 14-pole fit is shown in Figs. 2 and 3. In this
manner, not only do we reduce our computations for the poles, but
we are also in the advantageous position of having to interpret
14 instead of 28 poles. The total energy in the 5-10 kHz region
can be easily computed directly from the spectrum and used for
the detection of fricatives if desired. Alternatively, one could
fit a very low order all-pole spectrum to that region, as shown
in Fig. 3.

Now,  the  same  type  of  analysis  could  have  been  done  in  the 
time  domain,  but  consider  what  one  would  have  had  to  do.  First, 
the  20  kHz  sampled  signal  must  be  sharply  filtered  at  5  kHz. 
Second,  and  very  importantly,  the  signal  must  be  down-sampled  to 
10  kHz  by  discarding  every  other  sample.  Third,  a  14-pole  LP 
analysis  is  performed  on  the  resulting  signal.  And  fourth,  in 
order to obtain the energy in the 5-10 kHz region, one subtracts
the energy in the 10 kHz signal from the energy in the original
20 kHz signal. (It is even more complicated if one wants to perform
an LP analysis on the 5-10 kHz region in the time domain.)

Not  only  is  the  time  domain  analysis  more  involved  and  costly; 
it  is  also  very  inflexible.  Consider  the  problem  of  having  to 
carry  the  same  procedure  to  match  the  spectrum  in  the  0-3.5  kHz 
region  instead  of  0-5  kHz.  In  that  case,  it  would  be  necessary 
to  perform  the  time-domain  down-sampling  from  20  kHz  to  7  kHz: 
a  rather  difficult  task.  The  elegance  of  the  method  of  selective 
linear  prediction  lies  in  the  fact  that  the  two  problems  of  sharp 
filtering  and  down  sampling  are  completely  solved  by  working  in 
the  frequency  domain. 
We are currently applying this property to speech compression
systems that employ linear prediction. In this application,
it is desirable to be able to test the performance of the
system at different sampling rates. We sample the signal at the
highest sampling rate desired, and then we simulate the
performance of different sampling rates by applying selective linear
prediction to the corresponding frequency bands.

V. MODELING DISCRETE SPECTRA

Thus far we have assumed that the spectrum $P(\omega)$ is a
continuous function of frequency. Most often, however, the spectrum
is known at only a finite number of frequencies. For example,
an FFT-derived spectrum has values at equally spaced frequency
points. On the other hand, filter bank spectra usually have
values at frequencies that are not necessarily equally spaced.
For these discrete cases we define the error measure $E$ as a
summation instead of an integral:

$$E = \frac{G^2}{N} \sum_{n=0}^{N-1} \frac{P(\omega_n)}{\hat{P}(\omega_n)} , \qquad (28)$$

where $N$ is the total number of spectral points on the unit circle.
Following the same minimization procedure as in the continuous
case, we obtain the set of equations (10) again, but the
coefficients are now defined as

$$R_k = \frac{1}{N} \sum_{n=0}^{N-1} P(\omega_n) \cos(k\omega_n) . \qquad (29)$$

Note that in (28), only values of $\hat{P}(\omega)$ at the
frequencies $\omega_n$ contribute to the total error. Therefore,
after $\hat{P}(\omega)$ is obtained, the error between
$\hat{P}(\omega)$ and $P(\omega)$ is minimum at the frequencies
$\omega_n$, $0 \le n \le N-1$. At other frequencies,
$\hat{P}(\omega)$ cannot be guaranteed in any way except in that it
is a smooth function of frequency as given by (4).

If the spectrum is known at equally spaced frequency points,
then if desired, (29) can be computed via a fast Fourier transform
(FFT) of the spectrum $P(\omega_n)$. (In that case a highly
composite value of $N$ would help.) However, if the spectrum
$P(\omega_n)$ is known at frequencies that are not equally spaced,
then one can define a new spectrum $Q(\omega_m)$ at equally spaced
frequencies such that $Q(\omega_m) = P(\omega_n)$ at every
$\omega_m = \omega_n$, and is zero otherwise. One can then use an
FFT on $Q(\omega_m)$ to compute $R_k$. We do not necessarily
recommend the use of the method just outlined for cases where
the frequency spacing is nonuniform, because very often it is
simply faster to compute (29) directly. However, we wished to
make the point that adding spectral values that are zero does
not affect the error minimization process in any way, since
those values do not contribute to the total error, as is clear
from (28).
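
The point about zero-valued spectral samples can be illustrated
with a small sketch. The spectral values and grid size below are
hypothetical, and the sum over the equally spaced grid is evaluated
directly here, which is the quantity an FFT of $Q$ would deliver.
The division by the number of points is omitted in both sums so that
the two results can be compared term for term.

```python
import math


def autocorr_sum(P, thetas, k):
    """Unnormalized form of (29): sum_n P(w_n) cos(k w_n)."""
    return sum(Pn * math.cos(k * th) for Pn, th in zip(P, thetas))


# Hypothetical nonuniform spectrum: three points on the upper half of
# the unit circle, chosen to fall on a 16-point uniform grid.
M = 16
bins = [1, 3, 6]
values = [4.0, 2.0, 1.0]

# Complete the spectrum around the circle by evenness
thetas = ([2 * math.pi * m / M for m in bins]
          + [-2 * math.pi * m / M for m in bins])
P = values + values

# Equally spaced spectrum Q: equal to P where P is known, zero elsewhere
Q = [0.0] * M
for m, v in zip(bins, values):
    Q[m] = v
    Q[M - m] = v
grid = [2 * math.pi * m / M for m in range(M)]

R_direct = [autocorr_sum(P, thetas, k) for k in range(4)]
R_padded = [autocorr_sum(Q, grid, k) for k in range(4)]
```

The two lists agree, so the zero-valued points change neither the
autocorrelation coefficients nor the resulting model.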


Computational  Considerations 

The solution for the predictor coefficients $a_k$ in (10) is
unaffected if each of the autocorrelation coefficients is
multiplied or divided by a constant. Therefore, the division by $N$
in (29) is unnecessary to obtain the desired solution of (10).
The only possible importance of the division by $N$ (or some other
number) is to get a good estimate of the total energy $R_0$. What
number to divide with depends on how the signal spectrum was
obtained and on the particular application.


The spectrum $P(\omega_n)$ is an even function of frequency, i.e.
$P(\omega_{N-n}) = P(\omega_n)$. Usually what we have is a spectrum
that we map onto the unit circle, as explained in Section IV. The
evenness property is then applied in order to complete the
definition of the spectrum around the unit circle. The mapping in
the continuous frequency case is no problem. However, there are a
few matters to worry about in the discrete case. The main problem
is the relation of the frequencies $\omega_\alpha$ and
$\omega_\beta$ in (27) to the discrete frequencies $\omega_n$. There
is a total of four possible cases which are divided in two
categories:

(a) N even

(1) $\omega_0 \to 0$, $\omega_{N/2} \to \pi$.

(2) None of the frequencies $\omega_n$ correspond to either 0 or
$\pi$.

(b) N odd

(1) $\omega_0 \to 0$.

(2) One of the frequencies $\omega_n \to \pi$.

The four cases are illustrated in Fig. 4, where the crosses on
the unit circle correspond to the frequencies $\omega_n$. Case (a1)
is the one usually encountered in FFT-derived spectra with even N.
Case (a2) is usually encountered with filter bank spectra. Note
that, because of the evenness property of $P(\omega_n)$, (29) can be
simplified, but in a slightly different manner for each of the
four cases.
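
As an illustration of such a simplification, consider case (a1)
under the added assumption (not stated in the text) that the
frequencies are equally spaced, $\omega_n = 2\pi n/N$, as in an
FFT-derived spectrum. Pairing each point with its mirror image,
(29) reduces to a sum over the upper half of the unit circle:

$$R_k = \frac{1}{N} \left[ P(\omega_0) + (-1)^k P(\omega_{N/2}) + 2 \sum_{n=1}^{N/2-1} P(\omega_n) \cos\left( \frac{2\pi k n}{N} \right) \right] .$$

The points at $0$ and $\pi$ have no mirror image and are counted
once, since $\cos(k \cdot 0) = 1$ and $\cos(k\pi) = (-1)^k$. The
other cases differ only in which of these two end points are
present.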

Application  to  Filter  Bank  Spectra 

We simulated the output of a filter bank by simply adding
the energy in specified frequency bands from an FFT-derived
spectrum. The resulting simulated filter bank has center
frequencies and bandwidths similar to the hardware filter bank at
the Speech Communication Laboratory at M.I.T. The filters are
linearly spaced up to 1.6 kHz and logarithmically spaced
thereafter. Figures 5 and 6 show two examples of the application of
LP spectral modeling to the outputs of the simulated filter bank.
In each figure, the original spectrum and the corresponding
simulated filter bank spectrum are shown along with a 14-pole LP
spectrum in each case. (The spectral lines in the filter bank
spectra are shown with a finite width only because of the manner
in which they were plotted.) The filter bank LP spectra in
Figs. 5b and 6b are quite similar to those in Figs. 5a and 6a,
in spite of the relatively few spectral points in the filter
bank spectra, especially at high frequencies. The extra peak at
low frequencies in Fig. 6b is due to the lack of spectral points
at frequencies less than 150 Hz.

Fig. 4. Four possible configurations for discrete spectra. Each
cross represents one of the N spectral lines in the spectrum.

Fig. 5. Application of LP modeling to a filter bank vowel spectrum.
(A) A 14-pole fit to the original spectrum.
(B) A 14-pole fit to the simulated filter bank spectrum.

Fig. 6. Application of LP modeling to a filter bank fricative
spectrum.
(A) A 14-pole fit to the original spectrum.
(B) A 14-pole fit to the simulated filter bank spectrum.


Spectra  of  Periodic  Signals 

We  have  seen  in  Section  III  that  if  the  signal  spectrum  P(w) 
consists  of  pQ  poles  only,  then  for  p=po  the  LP  spectrum  P(w)  is 

identical  to  P(w).  The  situat  on  is  not  so  favorable  for  discrete 
signal  spectra,  as  we  shall  see  below. 


Let  us  assume  that  we  are  given  a  discrete  spectrum  p  (w) 
that  has  values  at  equally  spaced  frequencies  with  a  spacing  of 
,  such  that 


* 

P 

(w)  =•  0 


(w) 

0 


«-nu)o  ,  n  integer  , 
otherwise, 


(30) 


where $P_0(\omega)$ is a $p_0$-pole spectrum. $P_1(\omega)$ can be regarded
as the spectrum of a periodic signal that is generated by applying a
periodic unit sample sequence with period $\tau = 2\pi/\omega_0$ to an
all-pole filter whose magnitude squared frequency response is given by
$P_0(\omega)$. The question is: if $P_1(\omega)$ is our signal spectrum,
what will be the corresponding LP model spectrum for $p = p_0$? For LP
modeling in the discrete case we compute the parameters $a_k$ from
(10), where the autocorrelation coefficients are computed from
the DFT in (29) with $P(\omega_n)$ replaced by $P_1(n\omega_0)$. For a
nonzero fundamental $\omega_0$, the resulting model spectrum
$\hat{P}_1(\omega)$ will not be equal to $P_0(\omega)$ for $p = p_0$, or any
other value of $p$. This is illustrated in Fig. 7a where $P_0(\omega)$ is
the dashed curve, $P_1(\omega)$ is the line spectrum with
$F_0 = \omega_0/2\pi = 312$ Hz, and $\hat{P}_1(\omega)$ is the solid curve
and represents the LP spectrum corresponding to $P_1(\omega)$ for
$p = p_0$ (here $p_0 = 14$). The discrepancy between $\hat{P}_1(\omega)$ and
$P_0(\omega)$ in Fig. 7a is obvious. A decrease in $F_0$ brings
$\hat{P}_1(\omega)$ closer to $P_0(\omega)$ as in Fig. 7b. In the limit as
$F_0$ approaches zero ($\omega_0 \to 0$), $P_1(\omega)$ approaches
$P_0(\omega)$ and $\hat{P}_1(\omega)$ becomes identical to $P_0(\omega)$, as we
already know from the continuous frequency case.
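The dependence on the harmonic spacing can be checked numerically. The
following Python sketch is an illustration only, not the computation
behind Fig. 7. It assumes a hypothetical 2-pole filter (pole radius 0.9,
pole angle 1 rad), assumes that (29) is the cosine transform
$R_i = \frac{1}{N}\sum_n P(\omega_n)\cos(i\omega_n)$, and solves the normal
equations (10) with the Levinson recursion. All function names are
ours.

```python
import cmath
import math

def allpole_spectrum(a, w, gain2=1.0):
    """P(w) = gain2 / |A(e^{jw})|^2 with A(z) = sum_k a[k] z^{-k}, a[0] = 1."""
    A = sum(ak * cmath.exp(-1j * k * w) for k, ak in enumerate(a))
    return gain2 / abs(A) ** 2

def autocorr_from_lines(P, p):
    """Cosine DFT of N spectral lines spaced 2*pi/N apart: lags 0..p."""
    N = len(P)
    return [sum(P[n] * math.cos(2 * math.pi * i * n / N) for n in range(N)) / N
            for i in range(p + 1)]

def levinson(R, p):
    """Solve the LP normal equations; returns (a, E_p) with a[0] = 1."""
    a, E = [1.0] + [0.0] * p, R[0]
    for i in range(1, p + 1):
        k = -sum(a[j] * R[i - j] for j in range(i)) / E
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        E *= 1.0 - k * k
    return a, E

def lp_fit_error(a0, tau):
    """Largest coefficient error of a p0-pole LP fit to tau harmonic samples."""
    p0 = len(a0) - 1
    lines = [allpole_spectrum(a0, 2 * math.pi * n / tau) for n in range(tau)]
    a_hat, _ = levinson(autocorr_from_lines(lines, p0), p0)
    return max(abs(x - y) for x, y in zip(a_hat, a0))

# Hypothetical 2-pole filter (an assumption, not the 14-pole filter of Fig. 7).
a0 = [1.0, -2 * 0.9 * math.cos(1.0), 0.81]
errors = {tau: lp_fit_error(a0, tau) for tau in (8, 64, 512)}
for tau, err in errors.items():
    print(f"period of {tau:3d} samples: max coefficient error {err:.2e}")
```

A short period (few harmonics, high fundamental) leaves a visible
coefficient error, and the error shrinks as the period grows, in line
with the limit $\omega_0 \to 0$ discussed above.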

Figures 8 and 9 show other examples of modeling spectra of
periodic signals. The types of discrepancies that can occur between
the model and original spectra include merging or splitting
of pole peaks, and increasing or decreasing of pole frequencies
and bandwidths. In general, the pole movements are in the direction
of the nearest harmonic. Atal [11] has been making quantitative
measurements of these discrepancies.

It is important to note in Figs. 7-9 that the dashed curve
in each case is the only possible $p_0$-pole spectrum that coincides
with the line spectrum at the harmonics. (In general this is
true only if the period $\tau > 2p_0$ samples.) It is unfortunate that
the all-pole spectrum resulting from LP modeling does not yield
the spectrum we desire.

BBN Report No. 2578, Bolt Beranek and Newman Inc.

[Fig. 7. LP modeling of harmonic spectra. Two panels, (A) $F_0 = 312$ Hz
and (B) $F_0 = 156$ Hz; horizontal axis: frequency (kHz).
Dashed curve: filter 14-pole spectrum.
Lines: corresponding harmonic spectrum for (A) $F_0 = 312$ Hz and
(B) $F_0 = 156$ Hz.
Solid curve: a 14-pole fit to the discrete harmonic spectrum.]

[Fig. 9. LP modeling of harmonic spectra. (See Fig. 7.)]

Another relevant spectrum is that of a single pitch period;
let that be $Q(\omega)$. It is well known that $Q(\omega)$ is an all-zero
spectrum that coincides with $P_0(\omega)$ only at the harmonics
$n\omega_0$, i.e. $Q(n\omega_0) = P_1(n\omega_0) = P_0(n\omega_0)$. However,
since $Q(\omega)$ is otherwise not equal to $P_0(\omega)$, applying LP
modeling to $Q(\omega)$ with $p = p_0$ will result in an LP spectrum
$\hat{Q}(\omega)$ that is still different from the all-pole $P_0(\omega)$
and also different from the LP spectrum $\hat{P}_1(\omega)$
corresponding to the discrete spectrum $P_1(\omega)$, i.e.
$\hat{Q}(\omega) \neq \hat{P}_1(\omega) \neq P_0(\omega)$.

It  would  seem  from  the  above  that  LP  analysis  of  periodic 
signals  (especially  those  with  high  fundamental)  is  doomed  to 
be  of  a  very  approximate  nature.  Indeed,  if  nothing  is  known 
about  the  transfer  function  of  the  system,  there  is  a  basic  loss 
of  information  in  the  spectrum  of  the  periodic  signal  that  is 
irrecoverable.  This  is  true  whether  one  uses  linear  prediction 
or  some  other  form  of  analysis.  However,  the  previous  discussion 
shows  that  even  when  we  are  given  the  extra  information  that  the 
system  transfer  function  is  all-pole,  LP  analysis  does  not  seem 
to  be  able  to  recover  that  all-pole  spectrum.  The  reason,  of 
course, is that nowhere in the analysis did we actually use the
fact that $P_0(\omega)$ is all-pole. For example, in computing
$\hat{P}_1(\omega)$ from the line spectrum $P_1(\omega)$, we did not make
use of the fact that $P_1(n\omega_0) = P_0(n\omega_0)$ and that
$P_0(\omega)$ is all-pole. In fact, LP analysis does not allow us to use
that information.


All is not lost, however. The trick is to use the fact
that $P_1(n\omega_0) = P_0(n\omega_0)$ to generate $P_0(\omega)$ for all
$\omega$, and then to apply LP analysis to that, resulting in an LP
spectrum identical to $P_0(\omega)$. In order to generate all of
$P_0(\omega)$ from $P_1(n\omega_0)$ we use the important fact that the
autocorrelation of an all-zero spectrum with $p_0$ zeros is equal to
zero for lags $|k| > p_0$. For example, from (26) we see that the
autocorrelation $b_k$ of the all-zero inverse filter $A(z)$ is zero
for $|k| > p$. Since $P_0(\omega)$ is all-pole, its inverse
$P_0^{-1}(\omega)$ is all-zero. Let the autocorrelation of
$P_0^{-1}(\omega)$ be $r_k$. Then $r_k = 0$ for $|k| > p_0$,


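and

$$P_0^{-1}(\omega) = \sum_{k=-p_0}^{p_0} r_k e^{-jk\omega}. \qquad (31)$$

But since $P_1(n\omega_0) = P_0(n\omega_0)$ we must have

$$P_1^{-1}(n\omega_0) = P_0^{-1}(n\omega_0). \qquad (32)$$

Therefore,

$$P_1^{-1}(n\omega_0) = \sum_{k=-p_0}^{p_0} r_k e^{-jkn\omega_0} = \sum_{k=-p_0}^{p_0} r_k e^{-j2\pi kn/\tau}, \quad 0 \le n \le \tau-1, \qquad (33)$$

where $\tau$ is the number of samples in a pitch period. If we define

$$r_k = 0, \quad p_0 < k \le [\tau/2], \qquad \text{and} \qquad r_{\tau-k} = r_k, \qquad (34)$$

then

$$P_1^{-1}(n\omega_0) = \sum_{k=0}^{\tau-1} r_k e^{-j2\pi kn/\tau}, \quad 0 \le n \le \tau-1. \qquad (35)$$

But (35) is a $\tau$-point DFT, whose inverse is given by

$$r_k = \frac{1}{\tau} \sum_{n=0}^{\tau-1} P_1^{-1}(n\omega_0)\, e^{j2\pi kn/\tau}, \quad 0 \le k \le \tau-1. \qquad (36)$$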


Therefore, from (36), (34) and (31), one can reconstruct $P_0(\omega)$.
This is done as follows:

1. Compute the inverse of the line spectrum:
   $P_1^{-1}(n\omega_0) = 1/P_1(n\omega_0)$, $0 \le n \le \tau-1$.

2. Compute the inverse DFT of $P_1^{-1}(n\omega_0)$ using (36).
   With (34), this yields the autocorrelation function $r_k$.

3. Compute the all-zero spectrum $P_0^{-1}(\omega)$ from (31) for a
   large number of frequencies.

4. Compute $P_0(\omega) = 1/P_0^{-1}(\omega)$.

                                                                (37)
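The four steps of procedure (37) can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
the authors' implementation: the function names are ours, the test
filter is a hypothetical 2-pole filter, and the harmonic samples are
assumed to cover the full circle $0 \le n \le \tau-1$ with
$\tau > 2p_0$. Because the samples are symmetric about $\omega = 0$, the
inverse DFT (36) reduces to a cosine sum and $r_k$ is real.

```python
import cmath
import math

def allpole_spectrum(a, w, gain2=1.0):
    """P(w) = gain2 / |A(e^{jw})|^2 with A(z) = sum_k a[k] z^{-k}, a[0] = 1."""
    A = sum(ak * cmath.exp(-1j * k * w) for k, ak in enumerate(a))
    return gain2 / abs(A) ** 2

def inverse_autocorr_smoothing(lines, p0):
    """Return a function w -> P0(w) rebuilt from tau harmonic samples of P0."""
    tau = len(lines)
    if tau <= 2 * p0:
        raise ValueError("the period must exceed 2*p0 samples")
    # Step 1: invert the line spectrum.
    inv = [1.0 / v for v in lines]
    # Step 2: inverse DFT (36), keeping lags 0..p0 (higher lags vanish).
    r = [sum(inv[n] * math.cos(2 * math.pi * k * n / tau) for n in range(tau)) / tau
         for k in range(p0 + 1)]

    def spectrum(w):
        # Step 3: all-zero spectrum (31), using the symmetry r_{-k} = r_k.
        inv_p0 = r[0] + 2 * sum(r[k] * math.cos(k * w) for k in range(1, p0 + 1))
        # Step 4: invert to obtain the all-pole spectrum.
        return 1.0 / inv_p0

    return spectrum

# Hypothetical example: 2-pole filter sampled at tau = 8 harmonics.
a0 = [1.0, -2 * 0.9 * math.cos(1.0), 0.81]
tau = 8
lines = [allpole_spectrum(a0, 2 * math.pi * n / tau) for n in range(tau)]
smooth = inverse_autocorr_smoothing(lines, p0=2)

# Compare with the true spectrum at frequencies that are not harmonics.
test_freqs = [0.1 + 0.3 * i for i in range(10)]
max_rel_err = max(abs(smooth(w) / allpole_spectrum(a0, w) - 1.0) for w in test_freqs)
print(f"largest relative error between harmonics: {max_rel_err:.1e}")
```

For noise-free samples of a true all-pole spectrum the reconstruction
agrees with $P_0(\omega)$ to within rounding error, including between
the harmonics.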
If the spectrum $Q(\omega)$ of a single pitch period is given, then the
first thing to do is to sample $Q(\omega)$ at the harmonics. This yields
the line spectrum $P_1(n\omega_0)$. Then follow the procedure (37) above
to compute $P_0(\omega)$. Applying LP analysis to $P_0(\omega)$ with
$p = p_0$ will then yield an LP spectrum equal to $P_0(\omega)$.


Above we have shown how to recover the complete all-pole
spectrum given a finite number of equally spaced points on it.
The only restriction is that the number of harmonics in the spectrum
be at least equal to the number of poles. This can be thought
of as a method of "smoothing" the discrete spectrum. The smoothing
is done by resorting to the autocorrelation of the inverse spectrum.
Thus we might label this type of smoothing as inverse
autocorrelation smoothing. Because this method of smoothing is
based on an all-pole assumption for the spectrum, its application
to more general cases can be expected to run into problems. As a
simple example, let us assume that the given harmonic spectrum is
all-pole but noisy (e.g. as a result of quantization). This case
has arisen in our experiments in speech compression [15] where
selected spectral values are used as transmission parameters.
We employ the procedure given in (37) above to recover the linear
prediction coefficients. Problems arise upon quantization of
the spectral values to less than 5 bits. The autocorrelation
coefficients as computed from (36) lose their positive definiteness,
which results in a smoothed spectrum that is negative in certain
regions. This, in turn, results in an unstable linear prediction
filter with some poles outside the unit circle. There are
ways to remedy these situations in a reasonable manner [15], but
the message is clear that one should anticipate such problems.
The same problems arise if the original spectrum contains zeros
as well as poles. It should be emphasized, however, that these
problems arise when the number of harmonics in the spectrum is
small, i.e. on the order of the number of poles. If the number
of harmonics is at least twice the number of poles, the problems
are not likely to arise. However, for those cases, regular LP
analysis on the line spectrum produces satisfactory results,
thus obviating the need to use the procedure in (37).
VI. LINEAR PREDICTION VS. ANALYSIS-BY-SYNTHESIS

An important aspect of any fitting or matching procedure
is the properties of the error measure that is employed, and
whether those properties are commensurate with certain objectives.
In the spectral analysis of speech, a common objective is to have
the model spectrum $\hat{P}(\omega)$ approximate the envelope of the
signal power spectrum $P(\omega)$. In this section we shall explore in
some detail the properties of the error measure used in LP analysis
and then compare it to the error measure used in AbS, always
using as our criterion of goodness the ability of each matching
procedure to approximate the envelope of the signal spectrum.

LP  Error  Measure 

One important consideration in estimating the spectral envelope
is the determination of an optimal value for $p$, the number
of poles in the model spectrum. This topic has been discussed
elsewhere [8,9] and we shall not pursue it in this paper.
However, assuming that somehow we know this optimal value of $p$,
there remains the question of whether minimization of the error
measure in (5) will result in a good estimate of the spectral
envelope.

For each value of $p$, minimization of the error measure $E$
in (5) leads to the minimum error $E_p$ in (11). It can be shown
[8] that $E_p$ is also equal to

$$E_p = e^{\hat{c}_0}, \qquad (38)$$

where

$$\hat{c}_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log \hat{P}(\omega)\, d\omega \qquad (39)$$

is the zeroth coefficient (quefrency) of the cepstrum corresponding
to $\hat{P}(\omega)$. $E_p$ can also be interpreted as the geometric mean
of the model spectrum $\hat{P}(\omega)$. $E_p$ decreases monotonically as
$p$ increases [8], and the minimum occurs as $p \to \infty$, where
$\hat{P}(\omega)$ becomes identical to $P(\omega)$, and (38) reduces to

$$E_{\min} = e^{c_0}, \qquad (40)$$

where $c_0$ is obtained by substituting $P(\omega)$ for $\hat{P}(\omega)$ in
(39). If $P(\omega)$ is a $p_0$-pole spectrum then $E_p = E_{\min}$ for all
$p \ge p_0$. The absolute minimum error is a function of $P(\omega)$ only,
and is equal to its geometric mean, which is always nonnegative and
usually nonzero for speech spectra. This is a curious result, because
it says that the minimum error can be nonzero even when the matching
spectrum $\hat{P}(\omega)$ is identical to the matched spectrum
$P(\omega)$. This unusual property is due to the fact that the error
measure in (5) is defined as the average of the ratio of two
quantities and not their difference as is usual with most error
measures such as the mean squared error.
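Relation (38) and the monotonic decrease of $E_p$ can be checked
numerically. The Python sketch below is only an illustration. It
assumes a hypothetical signal spectrum with two poles and one zero
sampled at 512 points, assumes the model gain is $G^2 = E_p$ so that
$\hat{P}(\omega) = E_p/|A(e^{j\omega})|^2$, and replaces the integral in
(39) by an average over the samples. Function names are ours.

```python
import cmath
import math

def autocorr_from_spectrum(P, p):
    """Cosine DFT of N spectrum samples on the full circle: lags 0..p."""
    N = len(P)
    return [sum(P[n] * math.cos(2 * math.pi * i * n / N) for n in range(N)) / N
            for i in range(p + 1)]

def levinson(R, p):
    """Solve the LP normal equations; returns (a, E_p) with a[0] = 1."""
    a, E = [1.0] + [0.0] * p, R[0]
    for i in range(1, p + 1):
        k = -sum(a[j] * R[i - j] for j in range(i)) / E
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        E *= 1.0 - k * k
    return a, E

def poly_mag2(c, w):
    """|C(e^{jw})|^2 for C(z) = sum_k c[k] z^{-k}."""
    return abs(sum(ck * cmath.exp(-1j * k * w) for k, ck in enumerate(c))) ** 2

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

N = 512
freqs = [2 * math.pi * n / N for n in range(N)]
poles = [1.0, -2 * 0.9 * math.cos(1.0), 0.81]   # hypothetical denominator
zeros = [1.0, 0.5]                              # hypothetical numerator
P = [poly_mag2(zeros, w) / poly_mag2(poles, w) for w in freqs]
E_min = geometric_mean(P)                       # discrete analogue of (40)

results = {}
for p in (2, 4, 8):
    a, E_p = levinson(autocorr_from_spectrum(P, p), p)
    model = [E_p / poly_mag2(a, w) for w in freqs]   # assumes G^2 = E_p
    results[p] = (E_p, geometric_mean(model))
    print(f"p = {p}: E_p = {E_p:.6f}, geometric mean of model = {results[p][1]:.6f}")
print(f"E_min = {E_min:.6f}")
```

In this sketch the minimum error equals the geometric mean of the model
spectrum at every order, and it decreases toward the geometric mean of
the signal spectrum as $p$ grows.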
Let the ratio of $P(\omega)$ to $\hat{P}(\omega)$ be given by

$$E(\omega) = \frac{P(\omega)}{\hat{P}(\omega)}. \qquad (41)$$

Then from (23) we have

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} E(\omega)\, d\omega = 1, \quad \text{for all } p. \qquad (42)$$


$E(\omega)$ can be interpreted as the "instantaneous error" between
$P(\omega)$ and $\hat{P}(\omega)$ at frequency $\omega$. Equation (42) says
that the arithmetic mean of $E(\omega)$ is equal to 1, which means that
there are values of $E(\omega)$ greater and less than 1 such that the
average is equal to 1. (Except for the special case when $P(\omega)$ is
all-pole, the condition $E(\omega) = 1$ for all $\omega$ is true only as
$p \to \infty$.) In terms of the two spectra, this means that $P(\omega)$
will be greater than $\hat{P}(\omega)$ in some regions and less in others
such that (42) applies. However, the contribution to the total error
is more significant when $P(\omega)$ is greater than $\hat{P}(\omega)$ than
when $P(\omega)$ is smaller, e.g. a ratio $E(\omega) = 2$ (+3 dB)
contributes more to the total error than a ratio of 1/2 (-3 dB). We
conclude that, after the minimization of error, we expect a better
fit of $\hat{P}(\omega)$ to $P(\omega)$ where $P(\omega)$ is greater than
$\hat{P}(\omega)$, than where $P(\omega)$ is smaller. For example, if
$P(\omega)$ is the power spectrum of a quasi-periodic signal (such as a
sonorant), then most of the energy in $P(\omega)$ will
exist in the harmonics, and very little energy will reside between
harmonics. The error measure in (5) ensures that the
approximation of $\hat{P}(\omega)$ to $P(\omega)$ is far superior at the
harmonics where the energy is greater, than between the harmonics
where there is very little energy. Since $\hat{P}(\omega)$ is expected to
be a smooth spectrum (this is ensured by choosing an appropriate
value of $p$), we conclude that minimization of the error measure
in (5) results in a model spectrum $\hat{P}(\omega)$ that is a good
estimate of the spectral envelope of the signal spectrum $P(\omega)$. It
should be clear from the above that the importance of the goodness of
the error measure is not as crucial when the variations of the
signal spectrum from the spectral envelope are much less pronounced,
such as spectra of unvoiced stops, spectra of single
pitch periods, and ordinary filter-bank spectra.

Another important property of this estimation procedure is
that, because the contributions to the total error are determined
by the ratio of the two spectra, the matching process should
perform uniformly over the frequency range of interest, irrespective
of the shaping of the speech spectral envelope.

The error measure $E$ is similar in its properties to an
error measure used by Itakura and Saito [12,13] in their maximum
likelihood method which results in the same set of equations
(10). Their error measure is also "more sensitive to the
spectral peaks and less to the dips" [12]. They conclude that
for the purposes of synthesis this is a good property because
the ear is more sensitive to peaks than to dips in the spectrum.
Itakura and Saito were not explicit in what they meant by spectral
peaks and dips. There are two likely interpretations:
(1) The peaks correspond to harmonic peaks, and the dips are
those between the harmonic peaks. (2) The peaks correspond to
formants and the dips are the valleys in between. The second
interpretation is the one Flanagan [14] gives in his review of
Itakura and Saito's work. Flanagan states that "the minimization
results in a fit which is more sensitive at the spectral
peaks than in the valleys between the formants" [14]. We believe
both interpretations to be correct, but under very different
conditions. It all depends on the number of poles in the
model spectrum. If the number of poles is less than the number
necessary to characterize all the formants in the spectrum,
then indeed the fit could be better at the formant peaks than
in the valleys. On the other hand, if the number of poles is
greater than or equal to the minimum number of poles necessary
to represent the spectral envelope as in Figs. 1 and 5a, then
the fit in the valleys between the formants is just as good as
the fit at the formant peaks. In this case, the first interpretation
given above is more appropriate. Indeed, it is a fundamental
property of the error measure $E$ in (5) that given any
peaks and dips one wishes to define, one can always find a
value for $p$, the number of poles, such that the fit is equally
good at the peaks and dips. In fact, we know from (22) above
that as $p \to \infty$, the model spectrum fits the signal spectrum
exactly, all peaks and dips included. This, of course, is also
true for all $p \ge p_0$ if $P(\omega)$ is a $p_0$-pole spectrum.

It is clear from the above that the number of model spectrum
poles plays a crucial role in determining how the model spectrum
fits the signal spectrum. Since interpretations in terms of
peaks and dips can be misleading if not stated carefully, we prefer
to interpret the matching process by the relation of the
values of the signal spectrum $P(\omega)$ to those of the model
spectrum $\hat{P}(\omega)$. We merely state that, after error
minimization, the fit will be better for values of
$P(\omega) > \hat{P}(\omega)$ than for values of
$P(\omega) < \hat{P}(\omega)$. For spectral envelope estimation with an
appropriate number of poles, this guarantees us that harmonic peaks
($P(\omega) > \hat{P}(\omega)$) are matched better than the dips in between
($P(\omega) < \hat{P}(\omega)$), resulting in a good spectral envelope
match. For purposes of synthesis, a better spectral envelope fit
results in better synthesis, i.e. a better "perceptual fit".

Comparison  With  AbS 

In AbS [3] the error measure that was normally used is given
(in our notation) by:

$$E' = \int_{\omega} E'(\omega)\, d\omega, \qquad (43)$$

where

$$E'(\omega) = \left[\log P(\omega) - \log \hat{P}(\omega)\right]^2 = \left[\log \frac{P(\omega)}{\hat{P}(\omega)}\right]^2 = \left[\log E(\omega)\right]^2. \qquad (44)$$

Here $\hat{P}(\omega)$ is the model spectrum, $E(\omega)$ is the ratio of the
two spectra as in (41), and the integration in (43) is over the
frequency range of interest. Minimizing $E'$ is equivalent to
minimizing the mean squared error between the two log spectra.
In contrast to the error measure $E$ in LP, here a minimum error
of zero is possible, namely when the two spectra are identical.


The error measures $E$ and $E'$ in (5) and (43) are similar in
that the contributions to the total error are functions of the
ratio of the two spectra. We have already mentioned that
this fact makes the matching process perform uniformly over the
frequency range of interest. However, the error measure $E$ in
LP spectral matching has two advantages over $E'$: (a) For an
all-pole model spectrum, the minimization of $E$ in (5) leads to
a solution where the coefficients of the resulting $\hat{P}(\omega)$ are
computed simply by solving a set of simultaneous linear equations,
while the minimization of $E'$ has to be done iteratively. (b) For
many cases of interest, $E$ is a superior error measure to $E'$ if a
spectral envelope is desired. This is clear if one notes from
(44) that contributions to the total error $E'$ are made equally
whether $P(\omega) > \hat{P}(\omega)$ or $P(\omega) < \hat{P}(\omega)$, e.g. a
ratio $E(\omega) = 2$ (+3 dB) contributes equally to the total error $E'$
as a ratio of 1/2 (-3 dB).
This means that energy at the harmonics (in voiced sounds) and
the lack of energy between harmonics contribute equally to the
total error. This, of course, will not lead to a good spectral
envelope. One can dramatize the difference between the error
measures $E$ and $E'$ by assuming that the signal spectrum
$P(\omega) = 0$ for some range of frequencies (no matter how small). The
ratio $E(\omega)$ will be zero for the same range, but $E'(\omega)$ in (44)
will be infinite. The effect of this range of frequencies on the total
error is nil for $E$ and total for $E'$. It is clear that for cases
where the variations of the signal spectrum about the spectral
envelope are large, $E$ is a preferable measure of error to $E'$.

But then, traditional AbS methods have generally used already
smoothed spectra, in which case it is not exactly clear
which error measure is to be preferred. For the special case
when the signal spectrum is all-pole we know that both LP and
AbS error minimization result in a model spectrum that is identical
to the signal spectrum. (A salient difference, though, is
that the minimum error $E'$ in AbS will be zero.) For other smooth
signal spectra there is independent evidence [15] that the AbS
error measure might result in a better spectral fit. However,
for FFT-generated spectra (from a time signal) we believe that
linear prediction will generally be superior to AbS.

Comparison  for  Discrete  Spectra 


Another point of comparison between LP and AbS is in the
case of discrete spectra. This case is of particular interest
because AbS techniques were largely applied to filter bank
spectra. We shall consider only two types of spectra: harmonic
spectra and filter bank spectra. Both types of spectra will
be considered to be samples on a smooth spectral envelope.


The definition of error for AbS is obtained by replacing the
integral in (43) by a summation

$$E' = \sum_{n=0}^{N-1} \left[\log \frac{P(\omega_n)}{\hat{P}(\omega_n)}\right]^2. \qquad (45)$$

The comparison now is between $E'$ in (45) and $E$ in (28). The
absence of the factor $G^2/N$ in (45) is irrelevant to this
discussion.
An example which will put the issues into focus is that
given in Section V, where the signal spectrum is an all-pole
harmonic spectrum $P_1(\omega)$ as defined by (30), i.e. the harmonics
lie on a $p_0$-pole spectrum $P_0(\omega)$. We have seen that LP analysis
will not result in the desired envelope spectrum $P_0(\omega)$, as was
illustrated in Figs. 7-9. On the other hand, one can show that
by minimizing $E'$ in (45) with $P(\omega_n) = P_1(n\omega_0)$, the model
AbS spectrum will be identical to $P_0(\omega)$ for $p = p_0$. (The only
possible restriction is that the number of harmonics be at least
equal to the number of poles.) This is clear by noting that the
absolute minimum value that $E'$ in (45) can have is zero, and this
occurs only when the two spectra are equal at each frequency
$\omega_n$. Since in this example we know that there is a unique
all-pole spectrum $P_0(\omega)$ that is equal to $P_1(\omega)$ at each
frequency $\omega_n = n\omega_0$, we conclude that the all-pole model
spectrum $\hat{P}_1(\omega)$ will result in an error $E' = 0$, and
therefore must be identical to $P_0(\omega)$.
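The argument can be illustrated by evaluating $E'$ in (45) for two
candidate models of the same harmonic spectrum. The Python sketch below
makes several assumptions of its own: a hypothetical 2-pole filter, 8
harmonics around the full circle, and an LP model gain $G^2 = E_p$. It
does not perform the iterative AbS minimization itself. It only shows
that the true envelope $P_0(\omega)$ attains $E' = 0$ while the LP fit
does not.

```python
import cmath
import math

def allpole_spectrum(a, w, gain2=1.0):
    """P(w) = gain2 / |A(e^{jw})|^2 with A(z) = sum_k a[k] z^{-k}, a[0] = 1."""
    A = sum(ak * cmath.exp(-1j * k * w) for k, ak in enumerate(a))
    return gain2 / abs(A) ** 2

def autocorr_from_lines(P, p):
    """Cosine DFT of N spectral lines spaced 2*pi/N apart: lags 0..p."""
    N = len(P)
    return [sum(P[n] * math.cos(2 * math.pi * i * n / N) for n in range(N)) / N
            for i in range(p + 1)]

def levinson(R, p):
    """Solve the LP normal equations; returns (a, E_p) with a[0] = 1."""
    a, E = [1.0] + [0.0] * p, R[0]
    for i in range(1, p + 1):
        k = -sum(a[j] * R[i - j] for j in range(i)) / E
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        E *= 1.0 - k * k
    return a, E

def abs_error(signal, model):
    """Discrete AbS error (45): sum of squared log ratios at the harmonics."""
    return sum(math.log(s / m) ** 2 for s, m in zip(signal, model))

a0 = [1.0, -2 * 0.9 * math.cos(1.0), 0.81]      # hypothetical 2-pole filter
tau = 8
harmonics = [2 * math.pi * n / tau for n in range(tau)]
lines = [allpole_spectrum(a0, w) for w in harmonics]

# Candidate 1: the true envelope P0 evaluated at the harmonics.
e_true = abs_error(lines, [allpole_spectrum(a0, w) for w in harmonics])

# Candidate 2: the 2-pole LP fit to the line spectrum (gain assumed to be E_p).
a_lp, E_p = levinson(autocorr_from_lines(lines, 2), 2)
e_lp = abs_error(lines, [allpole_spectrum(a_lp, w, gain2=E_p) for w in harmonics])

print(f"E' for the true envelope: {e_true:.3e}")
print(f"E' for the LP fit:        {e_lp:.3e}")
```

Since $E'$ cannot be negative, a minimizer of (45) over 2-pole spectra
must reach the zero attained by the true envelope, which the LP fit
misses.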

The above example shows that for modeling of all-pole harmonic
spectra, AbS is clearly superior to LP. One could argue
that for this special case of all-pole harmonic spectra, it is
possible to use "inverse autocorrelation smoothing" as described
in Section V to recover the all-pole spectrum so that LP analysis
will result in the desired spectrum. However, as we pointed out
earlier, this method of smoothing is sensitive to spectral noise
and to the existence of zeros in the signal spectrum; its use
is generally not recommended. We do not mean to imply in the
above arguments that LP analysis should not be used at all with
harmonic spectra. We merely point out that AbS gives better
results, but at a much higher computational cost. If the results
shown in Figs. 7-9 are satisfactory for the application one has
in mind, then clearly LP analysis is to be preferred because of
the lower cost. If more accurate results are desired then one
must pay the price inherent in AbS. The same comments also apply
to modeling of filter bank spectra.

The reader might sense a contradiction between the above
conclusions and those made earlier in this section. (i) Earlier
we stated that, especially for the case of spectra of voiced
sounds where the energy is mainly concentrated around the harmonics,
such as in Fig. 5a, LP analysis is superior to AbS in
that it results in a better spectral envelope fit. (ii) On the
other hand, we have shown above that for the case of harmonic
spectra, such as in Figs. 7-9, AbS is superior to LP. The
contradiction is only apparent. The two types of harmonic spectra
mentioned above are radically different in the way they affect
the error. The signal spectrum in Fig. 5a makes large
excursions from the spectral envelope. While these excursions
are of little importance in LP error minimization, they are
disastrous to AbS error minimization. In contrast, in Figs. 7-9,
only the values at the harmonics are included in the error, so
that there are no large excursions to upset AbS error minimization.
It is not that LP does better in case (i), e.g. Fig. 5a;
it is that AbS does much worse. In fact, LP performs about the
same in cases (i) and (ii). The conclusions concerning LP analysis
as depicted in Figs. 7-9 also apply to the case in Fig. 5a.
The problem is that if one has to deal with case (i) then AbS
does not perform well and there is little choice but to use LP
analysis. An interesting solution to this problem is to convert
case (i) to case (ii) and then apply AbS instead of LP. This
can be done in Fig. 5a, for example, by "peak picking" the harmonics,
i.e. retain the values only at the harmonic peaks and
discard all other values, then apply AbS to the resulting line
spectrum. That should give better results than straight LP,
especially for high fundamentals. Another possibility is to
take the spectrum of a single pitch period, sample it at the
harmonics and then use AbS. The main obstacle, however, is the
computational cost associated with AbS. The attraction of LP
modeling is its simplicity; the price that one pays is that the
model spectrum can have only poles, and a degradation in performance
is expected with an increase in pitch frequency.
VII. ALL-ZERO MODELING

We have seen in Section II that if the model spectrum is
all-pole then the minimization of the LP error in (5) leads to
a set of linear equations (10) which can be easily solved for
the parameters of the model. It is straightforward to show that
if the model spectrum contains zeros (with or without poles), then
the minimization of (5) leads to a set of nonlinear equations
whose solution is generally iterative and not always readily
convergent. Computationally, then, LP analysis that includes
zeros in the model offers no distinct advantages over AbS.

However, if the model spectrum is all-zero then the problem
can be reformulated such that a suboptimal solution can be
obtained noniteratively. The idea is quite simple: Invert the
signal spectrum and apply an all-pole LP analysis, then invert
the all-pole LP spectrum to obtain the desired all-zero model.
We shall call this process inverse LP modeling. This solution
is clearly reasonable, and on the surface even seems to be optimal.
Unfortunately, there is a problem. Below we discuss this
problem and show how to deal with it.

We state again that our purpose in spectral modeling is to
obtain a good fit to the envelope of the signal spectrum. The
problem in the solution given above is that, in general, the
envelope of the inverted spectrum is not equal to the inverse
envelope of the spectrum. For example, if we invert the signal
spectrum in Fig. 5a, then the harmonic peaks become valleys and
the valleys between the harmonics become the new peaks. We
know that LP analysis on this inverted spectrum will follow these
new peaks whose envelope is not the one we are after. This problem
is not so severe if the signal spectrum is smooth relative
to the order of the model. For example, if the signal spectrum
consists of $q$ zeros only, then the above method leads to the
correct solution for $p = q$. Therefore, the solution to our problem
is to smooth the signal spectrum before we apply inverse LP analysis.
However, smoothing introduces a certain amount of error.
Therefore, inverse LP modeling on the smoothed spectrum is only
a suboptimal solution. The type and degree of smoothing can
affect the final result appreciably. Below we discuss these
matters briefly.
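The favorable case mentioned above, a signal spectrum made of $q$ zeros
only, can be checked with a small Python sketch. It is an illustration
under our own assumptions: a hypothetical minimum-phase 2-zero filter,
512 spectrum samples, and the cosine-transform form of the
autocorrelation. The function names are ours. Since LP always returns a
minimum-phase polynomial, the zeros are recovered only if the original
filter is minimum phase.

```python
import cmath
import math

def autocorr_from_spectrum(P, p):
    """Cosine DFT of N spectrum samples on the full circle: lags 0..p."""
    N = len(P)
    return [sum(P[n] * math.cos(2 * math.pi * i * n / N) for n in range(N)) / N
            for i in range(p + 1)]

def levinson(R, p):
    """Solve the LP normal equations; returns (a, E_p) with a[0] = 1."""
    a, E = [1.0] + [0.0] * p, R[0]
    for i in range(1, p + 1):
        k = -sum(a[j] * R[i - j] for j in range(i)) / E
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        E *= 1.0 - k * k
    return a, E

def poly_mag2(c, w):
    """|C(e^{jw})|^2 for C(z) = sum_k c[k] z^{-k}."""
    return abs(sum(ck * cmath.exp(-1j * k * w) for k, ck in enumerate(c))) ** 2

def inverse_lp(P, q):
    """Inverse LP modeling: invert the spectrum, then fit q poles to it."""
    inverted = [1.0 / v for v in P]
    b, _ = levinson(autocorr_from_spectrum(inverted, q), q)
    return b   # coefficients of the all-zero model B(z), b[0] = 1

# Hypothetical all-zero signal spectrum with q = 2 (zeros at radius 0.8).
b0 = [1.0, -2 * 0.8 * math.cos(1.0), 0.64]
N = 512
P = [poly_mag2(b0, 2 * math.pi * n / N) for n in range(N)]

b_hat = inverse_lp(P, q=2)
max_err = max(abs(x - y) for x, y in zip(b_hat, b0))
print("recovered zeros polynomial:", [round(c, 6) for c in b_hat])
print(f"largest coefficient error: {max_err:.1e}")
```

For a spectrum with harmonic structure the same call would track the
valleys between harmonics rather than the envelope, which is why
smoothing is needed first.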

The degree to which smoothing is performed must depend on
the order of the model considered. For example, a large amount
of smoothing can be tolerated if the order of the model is small.
In general, the simplest and perhaps most effective way to determine
the degree of smoothing is by inspection of the results.

There are several types or methods of spectral smoothing.
One can apply a low pass filter to the spectrum (autocorrelation
smoothing) or to the log spectrum (cepstral smoothing).
Autocorrelation smoothing has been used extensively by statisticians.
Cepstral smoothing is a more recent development that
has been employed in speech and picture processing. Another
method of smoothing that has become quite popular recently is
LP smoothing. Indeed, LP modeling can be thought of as just
another method of smoothing the spectrum. The degree of smoothing
is controlled by the order of the predictor. Usually, the order
of the predictor $p$ is chosen to be much larger than the number
of zeros in the model $q$. In this method, the whole procedure
is as follows: (a) Perform a regular $p$-pole LP analysis on the
signal spectrum, where $p \gg q$. (b) Compute the corresponding LP
spectrum and invert it. (c) Perform a $q$-pole LP analysis on the
inverted spectrum. The resulting predictor coefficients are the
desired parameters of the all-zero model.
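Steps (a) to (c) can be sketched in Python as follows. This is a
minimal illustration, not the authors' program. The function names, the
512-point frequency grid, the choice $p = 40$, and the hypothetical
2-zero test spectrum are all our assumptions, and the LP model gain is
taken to be $G^2 = E_p$.

```python
import cmath
import math

def autocorr_from_spectrum(P, p):
    """Cosine DFT of N spectrum samples on the full circle: lags 0..p."""
    N = len(P)
    return [sum(P[n] * math.cos(2 * math.pi * i * n / N) for n in range(N)) / N
            for i in range(p + 1)]

def levinson(R, p):
    """Solve the LP normal equations; returns (a, E_p) with a[0] = 1."""
    a, E = [1.0] + [0.0] * p, R[0]
    for i in range(1, p + 1):
        k = -sum(a[j] * R[i - j] for j in range(i)) / E
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k
        E *= 1.0 - k * k
    return a, E

def poly_mag2(c, w):
    """|C(e^{jw})|^2 for C(z) = sum_k c[k] z^{-k}."""
    return abs(sum(ck * cmath.exp(-1j * k * w) for k, ck in enumerate(c))) ** 2

def all_zero_model(P, p, q):
    """All-zero modeling with LP smoothing, following steps (a)-(c)."""
    N = len(P)
    freqs = [2 * math.pi * n / N for n in range(N)]
    # (a) regular p-pole LP analysis on the signal spectrum, p >> q
    a, E_p = levinson(autocorr_from_spectrum(P, p), p)
    # (b) inverse of the LP spectrum E_p / |A|^2, i.e. |A|^2 / E_p
    inverted = [poly_mag2(a, w) / E_p for w in freqs]
    # (c) q-pole LP analysis on the inverted spectrum
    b, _ = levinson(autocorr_from_spectrum(inverted, q), q)
    return b

# Hypothetical test: a smooth 2-zero spectrum, whose zeros should be recovered.
b0 = [1.0, -2 * 0.8 * math.cos(1.0), 0.64]
N = 512
P = [poly_mag2(b0, 2 * math.pi * n / N) for n in range(N)]

b_hat = all_zero_model(P, p=40, q=2)
max_err = max(abs(x - y) for x, y in zip(b_hat, b0))
print("all-zero model coefficients:", [round(c, 4) for c in b_hat])
print(f"largest coefficient error: {max_err:.1e}")
```

On this smooth test spectrum the result is close to the original zeros;
the small residual error reflects the finite order $p$ used in the
smoothing step.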

We  point  out  that  in  speech  analysis  all-zero  modeling  can 
be  used  to  study  the  spectral  characteristics  of  glottal  pulses. 
VIII. CONCLUSIONS


Linear predictive analysis was presented as a problem in
spectral modeling in which the signal spectrum is modeled by
an all-pole spectrum through the minimization of an error measure
given by the integrated ratio of the signal and model spectra.
The parameters of the all-pole model are obtained as the solution
of a set of linear equations. The only values needed for the
computation of all p parameters are the first p+1 autocorrelation
coefficients, which are computed from the signal spectrum by a
simple Fourier transform. Alternatively, the autocorrelation
coefficients can be computed from the time signal, if available.
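
As a minimal sketch of this computation, the Python fragment below obtains the first p+1 autocorrelation coefficients from a sampled power spectrum by an inverse FFT and then solves the p linear equations directly. The function names are hypothetical, the predictor polynomial is assumed to have the form A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the spectrum is assumed to be sampled uniformly on [0, 2*pi).

```python
import numpy as np

def autocorr_from_power_spectrum(P, p):
    """First p+1 autocorrelation coefficients of a sampled power spectrum."""
    return np.fft.ifft(P).real[:p + 1]

def lp_from_autocorr(R):
    """Solve sum_k a_k R(|i-k|) = -R(i), i = 1..p, for the predictor a."""
    p = len(R) - 1
    T = np.array([[R[abs(i - k)] for k in range(p)] for i in range(p)])
    a = np.linalg.solve(T, -R[1:])
    gain2 = R[0] + a @ R[1:]        # minimum error, used as the squared gain
    return a, gain2

# Illustrative check on a synthetic two-pole spectrum 1 / |A(e^{jw})|^2.
a_true = np.array([-0.9, 0.5])
w = 2 * np.pi * np.arange(1024) / 1024
A = 1 + a_true[0] * np.exp(-1j * w) + a_true[1] * np.exp(-2j * w)
P = 1.0 / np.abs(A) ** 2
a_est, g2 = lp_from_autocorr(autocorr_from_power_spectrum(P, 2))
```

A general-purpose solver is used here only for brevity; because the coefficient matrix is Toeplitz, a recursive solution could be substituted without changing the result.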


The spectral formulation leads to the method of selective
linear prediction, where selected portions of a spectrum can be
fitted by an all-pole spectrum. This method allows for arbitrary
spectral shaping in the frequency domain, thus obviating the need
for any special time domain filtering. In addition, different
portions of a spectrum can be fitted by different numbers of
poles, a property that is useful in speech recognition
applications. The method is also applicable to linear predictive
speech compression systems, where different sampling rates can
be simulated without the need for sharp filtering or down sampling.
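
The following Python sketch illustrates the idea under one stated assumption: the selected band is mapped linearly onto the full range 0 to pi before the autocorrelation coefficients are computed, so that everything outside the band is simply ignored. The function name `selective_lp`, the bin indices and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np

def selective_lp(P_half, lo, hi, order):
    """Fit an all-pole model to bins lo..hi (inclusive) of a half spectrum.

    P_half holds power-spectrum samples on [0, pi]. The selected band is
    treated as if it covered the whole range 0..pi (assumed mapping).
    """
    band = np.asarray(P_half[lo:hi + 1], dtype=float)
    # irfft treats `band` as the non-negative-frequency half of an even
    # spectrum and returns the corresponding real autocorrelation sequence.
    R = np.fft.irfft(band)[:order + 1]
    T = np.array([[R[abs(i - k)] for k in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(T, -R[1:])
    gain2 = R[0] + a @ R[1:]
    return a, gain2

# Illustrative data: a two-pole shape placed in bins 64..192 of a
# 257-bin half spectrum, with arbitrary content outside the band.
a_true = np.array([-0.9, 0.5])
lo, hi = 64, 192
w_band = np.pi * np.arange(hi - lo + 1) / (hi - lo)
A = 1 + a_true[0] * np.exp(-1j * w_band) + a_true[1] * np.exp(-2j * w_band)
P_half = np.full(257, 1e3)
P_half[lo:hi + 1] = 1.0 / np.abs(A) ** 2
a_est, g2 = selective_lp(P_half, lo, hi, order=2)
```

Because only the selected bins enter the autocorrelation, no time domain filter or resampling step appears anywhere in the sketch, and a different `order` can be used for each band.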

LP analysis has also been applied to the modeling of discrete
spectra, such as harmonic and filter bank spectra. It was shown
that the modeling process has definite problems as the number of
spectral lines decreases, i.e., as the fundamental frequency
increases. This has clear implications for the analysis of
high-pitched voices, such as female and children's speech.
For the special case when the harmonic spectrum is a sampled
all-pole spectrum, we were able to recover the all-pole spectrum
by first applying inverse autocorrelation smoothing. However,
this method of smoothing was not recommended as a general method
of dealing with the problems associated with high fundamentals.

A detailed comparison was given between LP modeling and
analysis-by-synthesis (AbS), in which the error measure is defined
as the average of the square of the difference between the signal
and model log spectra. The two methods were seen to have
two properties in common: (a) the spectral matching can be done
selectively to any portion of the spectrum, and (b) both error
criteria are functions of the ratio of the original and model
spectra, which results in a matching process that performs
uniformly over the frequency range of interest. For the special
case of an all-pole model, LP analysis was seen to offer two
important advantages: (a) the computations for the spectral
parameters are straightforward and noniterative, and (b) if the
time signal is available, there is no need to compute the spectrum
first. However, the major difference between LP and AbS modeling
is in the quality of match between the model and signal spectra.
If the variations of the signal spectrum about the model spectrum
are large, then LP analysis is preferable to AbS. This is usually
the case if the signal spectrum is FFT-derived from a time signal.
However, if the signal spectrum is smooth relative to the
model spectrum, then AbS is expected to give better results than
LP analysis. This occurs with filter bank spectra and cepstrally
(or otherwise) smoothed spectra.
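
For reference, the two error measures described in words above can be written side by side. The notation is assumed here for illustration: P(w) is the signal spectrum, P^(w) = G^2 / |A(e^{jw})|^2 is the all-pole model spectrum with gain G, and the normalization shown is one common choice rather than necessarily the one used in the body of the report.

```latex
E_{\mathrm{LP}} = \frac{G^{2}}{2\pi} \int_{-\pi}^{\pi}
    \frac{P(\omega)}{\hat{P}(\omega)} \, d\omega ,
\qquad
E_{\mathrm{AbS}} = \frac{1}{2\pi} \int_{-\pi}^{\pi}
    \left[ \log P(\omega) - \log \hat{P}(\omega) \right]^{2} d\omega
  = \frac{1}{2\pi} \int_{-\pi}^{\pi}
    \left[ \log \frac{P(\omega)}{\hat{P}(\omega)} \right]^{2} d\omega .
```

Both integrands depend on the spectra only through the ratio of signal to model spectrum, which is property (b) above. The AbS integrand is symmetric in the log ratio, whereas the LP integrand grows with the ratio itself, so frequencies where the signal spectrum lies above the model contribute more to the LP error than frequencies where it lies below.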

Finally,  we  gave  a  suboptimal  solution  to  the  problem  of 
all-zero  modeling  using  LP  analysis.  The  solution  is  simply  to 
apply all-pole LP modeling to the inverted spectrum. This,
however, requires that the spectrum be smoothed before inversion.